Smart Paper Notes

Neural Surface Reconstruction of Dynamic Scenes with Monocular RGB-D Camera

Hongrui Cai , Wanquan Feng , Xuetao Feng , Yan Wang , Juyong Zhang

Category: Computer Vision

2022-06-30

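We propose Neural-DynamicReconstruction (NDR), a template-free method that recovers high-fidelity geometry and motion of a dynamic scene from a monocular RGB-D camera. NDR adopts a neural implicit function for surface representation and rendering, so that the captured color and depth can be fully used to jointly optimize the surface and the deformations. To represent and constrain non-rigid deformations, we propose a novel neural invertible deformation network under which cycle consistency between any two frames is satisfied automatically. Because the surface topology of a dynamic scene may change over time, we employ a topology-aware strategy to construct topology-variant correspondences for the fused frames. NDR also refines the camera poses through global optimization. Experiments on public datasets and on our own collected dataset show that NDR outperforms existing monocular dynamic reconstruction methods.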


TwiBot-22: Towards Graph-Based Twitter Bot Detection

Shangbin Feng , Zhaoxuan Tan , Herun Wan , Ningnan Wang , Zilong Chen , Binchi Zhang , Qinghua Zheng , Wenqian Zhang , Zhenyu Lei , Shujie Yang

Category: Artificial Intelligence

2022-06-09

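Twitter bot detection has become an increasingly important task for combating misinformation, facilitating social media moderation, and preserving the integrity of online discourse. State-of-the-art bot detection methods generally leverage the graph structure of the Twitter network, and they show promising performance against novel Twitter bots that traditional methods fail to detect. However, very few existing Twitter bot detection datasets are graph-based, and even those suffer from limited dataset scale, incomplete graph structure, and low annotation quality. The lack of a large-scale graph-based Twitter bot detection benchmark that addresses these issues has seriously hindered the development and evaluation of graph-based bot detection methods. In this paper, we propose TwiBot-22, a comprehensive graph-based Twitter bot detection benchmark that presents the largest dataset to date, provides diversified entities and relations on the Twitter network, and has considerably better annotation quality than existing datasets. In addition, we re-implement 35 representative Twitter bot detection baselines and evaluate them on 9 datasets, including TwiBot-22, to enable a fair comparison of model performance and a holistic understanding of research progress. To facilitate further research, we consolidate all implemented code and datasets into the TwiBot-22 evaluation framework, where researchers can consistently evaluate new models and datasets. The TwiBot-22 Twitter bot detection benchmark and evaluation framework are publicly available at https://twibot22.github.io/.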


Neural Points: Point Cloud Representation with Neural Fields

Wanquan Feng , Jin Li , Hongrui Cai , Xiaonan Luo , Juyong Zhang

Category: Computer Vision

2021-12-08

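In this paper, we propose Neural Points, a novel point cloud representation. Unlike traditional point cloud representations, in which each point only represents a position or a local plane in 3D space, each point in Neural Points represents a local continuous geometric shape via a neural field. Neural Points can therefore express far more complex detail and has stronger representation ability. Neural Points is trained on high-resolution surfaces containing rich geometric detail, so that the trained model has enough expressive power for a variety of shapes. Specifically, we extract deep local features on the points and construct the neural fields through the local isomorphism between a 2D parametric domain and the 3D local patch. Finally, the local neural fields are integrated to form the global surface. Experimental results show that Neural Points has powerful representation ability and excellent robustness and generalization. With Neural Points, we can resample a point cloud at arbitrary resolution, and the method outperforms state-of-the-art point cloud upsampling methods by a large margin.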


Rethinking Rotation Invariance with Point Cloud Registration

Jianhui Yu , Chaoyi Zhang , Weidong Cai

Category: Computer Vision

2022-12-31

Recent investigations of rotation invariance for 3D point clouds have been devoted to devising rotation-invariant feature descriptors or learning canonical spaces where objects are semantically aligned. Learning frameworks for invariance have seldom been examined. In this work, we review rotation invariance in terms of point cloud registration and propose an effective framework for rotation invariance learning via three sequential stages, namely rotation-invariant shape encoding, aligned feature integration, and deep feature registration. We first encode shape descriptors constructed with respect to reference frames defined over different scales, e.g., local patches and global topology, to generate rotation-invariant latent shape codes. Within the integration stage, we propose the Aligned Integration Transformer to produce a discriminative feature representation by integrating point-wise self- and cross-relations established within the shape codes. Meanwhile, we adopt rigid transformations between reference frames to align the shape codes for feature consistency across different scales. Finally, the deep integrated feature is registered to both rotation-invariant shape codes to maximize feature similarities, such that rotation invariance of the integrated feature is preserved and shared semantic information is implicitly extracted from the shape codes. Experimental results on 3D shape classification, part segmentation, and retrieval tasks demonstrate the feasibility of our work. Our project page is released at: https://rotation3d.github.io/.


An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models

Yufeng Zhang , Boyi Liu , Qi Cai , Lingxiao Wang , Zhaoran Wang

Category: Machine Learning

2022-12-30

With the attention mechanism, transformers achieve significant empirical successes. Despite the intuitive understanding that transformers perform relational inference over long sequences to produce desirable representations, we lack a rigorous theory on how the attention mechanism achieves it. In particular, several intriguing questions remain open: (a) What makes a desirable representation? (b) How does the attention mechanism infer the desirable representation within the forward pass? (c) How does a pretraining procedure learn to infer the desirable representation through the backward pass? We observe that, as is the case in BERT and ViT, input tokens are often exchangeable since they already include positional encodings. The notion of exchangeability induces a latent variable model that is invariant to input sizes, which enables our theoretical analysis.

- To answer (a) on representation, we establish the existence of a sufficient and minimal representation of input tokens. In particular, such a representation instantiates the posterior distribution of the latent variable given input tokens, which plays a central role in predicting output labels and solving downstream tasks.
- To answer (b) on inference, we prove that attention with the desired parameter infers the latent posterior up to an approximation error, which is decreasing in input sizes. In detail, we quantify how attention approximates the conditional mean of the value given the key, which characterizes how it performs relational inference over long sequences.
- To answer (c) on learning, we prove that both supervised and self-supervised objectives allow empirical risk minimization to learn the desired parameter up to a generalization error, which is independent of input sizes. Particularly, in the self-supervised setting, we identify a condition number that is pivotal to solving downstream tasks.
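The statement in (b), that attention approximates the conditional mean of the value given the key, can be illustrated with a toy sketch. This is not the paper's construction. It assumes scalar keys, scores given by negative squared distance (so the softmax acts like a Gaussian-kernel, Nadaraya-Watson-style estimator), and a made-up data model in which the true conditional mean is `2 * key`. The names `softmax_attention` and `demo` are hypothetical.

```python
import math
import random


def softmax_attention(query, keys, values, temperature=0.05):
    """Softmax-weighted average of values.

    Assumption: the score of each key is -(query - key)^2 / temperature,
    which makes the softmax weights a normalized Gaussian kernel.
    """
    scores = [-(query - k) ** 2 / temperature for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def demo(n, seed=0):
    """Estimate E[value | key = 0.5] from n noisy (key, value) tokens."""
    rng = random.Random(seed)
    keys = [rng.random() for _ in range(n)]
    # Toy data model (an assumption): value = 2 * key + Gaussian noise.
    values = [2.0 * k + rng.gauss(0.0, 0.1) for k in keys]
    return softmax_attention(0.5, keys, values)


if __name__ == "__main__":
    # The true conditional mean at key = 0.5 is 1.0 under the toy model.
    for n in (10, 100, 1000, 10000):
        print(n, abs(demo(n) - 1.0))
```

Running the script prints the absolute estimation error for several sequence lengths, which gives a rough feel for how the estimate behaves as the number of input tokens grows.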


Heterogeneous Synthetic Learner for Panel Data

Ye Shen , Runzhe Wan , Hengrui Cai , Rui Song

Category: Machine Learning (Statistics) | Machine Learning

2022-12-30

In the new era of personalization, learning the heterogeneous treatment effect (HTE) has become an inevitable trend with numerous applications. Yet most existing HTE estimation methods focus on independently and identically distributed observations and cannot handle the non-stationarity and temporal dependency of the common panel data setting. The treatment evaluators developed for panel data, on the other hand, typically ignore individualized information. To fill the gap, in this paper, we initiate the study of HTE estimation in panel data. Under different assumptions for HTE identifiability, we propose the corresponding heterogeneous one-side and two-side synthetic learners, namely H1SL and H2SL, by leveraging state-of-the-art HTE estimators for non-panel data and generalizing the synthetic control method to allow flexible data-generating processes. We establish the convergence rates of the proposed estimators. The superior performance of the proposed methods over existing ones is demonstrated by extensive numerical studies.


Feature Selection Approaches for Optimising Music Emotion Recognition Methods

Le Cai , Sam Ferguson , Haiyan Lu , Gengfa Fang

Category: Machine Learning

2022-12-27

High feature dimensionality is a challenge in music emotion recognition (MER). There is no consensus on the relation between audio features and emotion. MER systems typically use all available features to recognize emotion; however, this is not optimal, since the full feature set contains irrelevant data that acts as noise. In this paper, we introduce a feature selection approach to eliminate redundant features for MER. We created a Selected Feature Set (SFS) using a feature selection algorithm (FSA) and benchmarked it by training two models, Support Vector Regression (SVR) and Random Forest (RF), and comparing them against the same models trained on the Complete Feature Set (CFS). The results indicate that MER performance improves for both RF and SVR models when using the SFS. We found that using the FSA can improve performance in all scenarios, and that it has potential benefits for model efficiency and stability in the MER task.


Data-Driven Linear Complexity Low-Rank Approximation of General Kernel Matrices: A Geometric Approach

Difeng Cai , Edmond Chow , Yuanzhe Xi

Category: Machine Learning

2022-12-24

A general, rectangular kernel matrix may be defined as $K_{ij} = \kappa(x_i,y_j)$ where $\kappa(x,y)$ is a kernel function and where $X=\{x_i\}_{i=1}^m$ and $Y=\{y_j\}_{j=1}^n$ are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large and are not well-separated (e.g., the points in $X$ and $Y$ may be "intermingled"). Such rectangular kernel matrices may arise, for example, in Gaussian process regression where $X$ corresponds to the training data and $Y$ corresponds to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function and avoid forming the matrix, which rules out most algebraic techniques. In particular, we seek methods that scale linearly, i.e., with computational complexity $O(m)$ or $O(n)$ for a fixed accuracy or rank. The main idea in this paper is to geometrically select appropriate subsets of points to construct a low-rank approximation. An analysis in this paper guides how this selection should be performed.
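The idea of building a low-rank approximation from a geometrically chosen subset of points can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: it uses a Nyström-style factorization $K \approx K(X,S)\,K(S,S)^{-1}\,K(S,Y)$, picks the landmark set $S$ by greedy farthest-point sampling over $X \cup Y$, and uses a Gaussian kernel. All function names are hypothetical. For a fixed number of landmarks $k$, the factors need only $O((m+n)k)$ kernel evaluations and the full $m \times n$ matrix is never formed.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """kappa(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)); points are tuples."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def farthest_point_sampling(points, k):
    """Greedy geometric selection: pick the point farthest from those chosen so far.

    Assumes the points are distinct and k <= len(points).
    """
    chosen = [points[0]]
    dist = [math.dist(p, chosen[0]) for p in points]
    while len(chosen) < k:
        idx = max(range(len(points)), key=dist.__getitem__)
        chosen.append(points[idx])
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return chosen


def solve(a, b):
    """Solve a @ z = b (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [[aug[i][n + j] / aug[i][i] for j in range(len(b[0]))] for i in range(n)]


def low_rank_factors(xs, ys, k, kernel=gaussian_kernel):
    """Return (U, V) with K ~= U @ V, where U = K(X, S) and V = K(S, S)^{-1} K(S, Y)."""
    landmarks = farthest_point_sampling(xs + ys, k)
    u = [[kernel(x, s) for s in landmarks] for x in xs]
    kss = [[kernel(s, t) for t in landmarks] for s in landmarks]
    ksy = [[kernel(s, y) for y in ys] for s in landmarks]
    return u, solve(kss, ksy)


def max_error(xs, ys, u, v, kernel=gaussian_kernel):
    """Largest entrywise error. Forms every entry of K, so use only for small checks."""
    return max(
        abs(kernel(x, y) - sum(u[i][l] * v[l][j] for l in range(len(v))))
        for i, x in enumerate(xs)
        for j, y in enumerate(ys)
    )
```

A caller would pass two lists of coordinate tuples and a rank, for example `u, v = low_rank_factors(xs, ys, k=8)`, and then apply `u` and `v` in sequence instead of the full matrix. Note that `K(S, S)` can become ill-conditioned when landmarks lie close together relative to the kernel bandwidth, so a practical version would need regularization or a more careful solver.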


Reversible Column Networks

Yuxuan Cai , Yizhuang Zhou , Qi Han , Jianjian Sun , Xiangwen Kong , Jun Li , Xiangyu Zhang

Category: Computer Vision

2022-12-22

We propose a new neural network design paradigm, the Reversible Column Network (RevCol). The main body of RevCol is composed of multiple copies of a subnetwork, called columns, between which multi-level reversible connections are employed. This architectural scheme gives RevCol very different behavior from conventional networks: during forward propagation, features in RevCol are gradually disentangled as they pass through each column, and their total information is maintained rather than compressed or discarded as in other networks. Our experiments suggest that CNN-style RevCol models can achieve very competitive performance on multiple computer vision tasks such as image classification, object detection, and semantic segmentation, especially with a large parameter budget and a large dataset. For example, after ImageNet-22K pre-training, RevCol-XL obtains 88.2% ImageNet-1K accuracy. Given more pre-training data, our largest model, RevCol-H, reaches 90.0% on ImageNet-1K, 63.8% box AP on the COCO detection minival set, and 61.0% mIoU on ADE20k segmentation. To our knowledge, these are the best COCO detection and ADE20k segmentation results among pure (static) CNN models. Moreover, as a general macro architecture, RevCol can also be introduced into transformers or other neural networks, where it is shown to improve performance in both computer vision and NLP tasks. We release code and models at https://github.com/megvii-research/RevCol
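Why a reversible connection maintains information can be seen in a minimal sketch. The code below is a generic additive coupling, not RevCol's multi-level formulation, and the sub-blocks `f` and `g` are arbitrary hypothetical stand-ins for learned layers. The point it illustrates is that the inputs can be reconstructed from the outputs, so nothing is discarded in the forward pass.

```python
import math


def f(x):
    """Hypothetical stand-in for a learned sub-block; it need not be invertible."""
    return [math.tanh(2.0 * v) for v in x]


def g(x):
    """A second hypothetical sub-block, also not invertible on its own."""
    return [0.5 * v * v - 1.0 for v in x]


def forward(x1, x2):
    """Additive coupling: each half is updated using only the other half."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def inverse(y1, y2):
    """Undo the coupling by subtracting the same terms in reverse order."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


if __name__ == "__main__":
    x1, x2 = [0.3, -1.2], [2.0, 0.7]
    y1, y2 = forward(x1, x2)
    print(inverse(y1, y2))  # matches (x1, x2) up to floating-point rounding
```

Because the inverse only re-evaluates `f` and `g` and subtracts, the reconstruction works even though neither sub-block is invertible by itself.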


Trajectory Generation and Tracking Control for Aggressive Tail-Sitter Flights

Guozheng Lu , Yixi Cai , Nan Chen , Fanze Kong , Yunfan Ren , Fu Zhang

Category: Robotics

2022-12-22

We address the theoretical and practical problems related to the trajectory generation and tracking control of tail-sitter UAVs. Theoretically, we focus on the differential flatness property with full exploitation of actual UAV aerodynamic models, which lays a foundation for generating dynamically feasible trajectories and achieving high-performance tracking control. We have found that a tail-sitter is differentially flat with accurate aerodynamic models within the entire flight envelope, by specifying a coordinated flight condition and choosing the vehicle position as the flat output. This fundamental property allows us to fully exploit high-fidelity aerodynamic models in trajectory planning and tracking control to achieve accurate tail-sitter flights. In particular, an optimization-based trajectory planner for tail-sitters is proposed to design high-quality, smooth trajectories with consideration of kinodynamic constraints, singularity-free constraints, and actuator saturation. The planned trajectory of the flat output is transformed to a state trajectory in real time with consideration of wind in the environment. To track the state trajectory, a global, singularity-free, and minimally parameterized on-manifold MPC is developed, which fully leverages the accurate aerodynamic model to achieve high-accuracy trajectory tracking within the whole flight envelope. The effectiveness of the proposed framework is demonstrated through extensive real-world experiments in both indoor and outdoor field tests, including agile SE(3) flight through consecutive narrow windows requiring specific attitudes at speeds up to 10 m/s, typical tail-sitter maneuvers (transition, level flight, and loiter) at speeds up to 20 m/s, and extremely aggressive aerobatic maneuvers (Wingover, Loop, Vertical Eight, and Cuban Eight) with acceleration up to 2.5 g.
